Automatic acquisition of “noun+verb” idiomatic compounds in Korean

Author

  • Sanghoun Song
Abstract

Song, Sanghoun. 2015. Automatic acquisition of “noun+verb” idiomatic compounds in Korean. Linguistic Research 32(1), 253-280. Current work in computational linguistics pays close attention to lexical semantics, because it can improve both the coverage and the accuracy of language processing systems. In particular, multiword expressions are regarded as an important component for improving the performance of language applications. Handling these expressions is especially crucial in multilingual processing, such as machine translation. Among the various types of multiword expressions, the present study investigates “noun+verb” idiomatic compounds in Korean. These compounds consist of a verb plus the verb’s syntactic object, and the meaning of the combination is not equivalent to the sum of the meanings of its parts. To acquire these compounds fully automatically, the study exploits a syntax-annotated corpus (i.e., a treebank) and three Korean lexical hierarchies. It extracts syntactic patterns from the development corpus (the Sejong Korean Treebank), calculates the selectional preferences each verbal item has for its objects, and identifies idiosyncratic items with reference to the three lexical hierarchies (CoreNet, KorLex, and U-WIN). The result comprises 548 idiomatic compounds, 70% of which are evaluated as satisfactory. (Nanyang Technological University)
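The selectional-preference step described in the abstract can be illustrated with a small sketch. The abstract does not say which measure the study uses, so the code below assumes a Resnik-style selectional association (each semantic class's share of the KL divergence between the verb's object classes and the overall class distribution). The verb-object pairs, the noun-to-class map, and the function names `selectional_association` and `idiom_candidates` are all hypothetical; English glosses stand in for Korean lemmas, and the flat class map stands in for a real lexical hierarchy.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (verb, object noun) pairs such as might be
# extracted from a treebank. Not taken from the Sejong Korean Treebank.
PAIRS = [
    ("eat", "rice"), ("eat", "bread"), ("eat", "rice"), ("eat", "heart"),
    ("read", "book"), ("read", "letter"), ("read", "heart"),
    ("wash", "hand"), ("wash", "foot"), ("wash", "foot"),
]

# Hypothetical noun -> semantic class map, standing in for a lexical hierarchy.
NOUN_CLASS = {
    "rice": "food", "bread": "food",
    "book": "text", "letter": "text",
    "hand": "body", "foot": "body", "heart": "body",
}


def selectional_association(pairs, noun_class):
    """Return {verb: {class: association}} (assumed Resnik-style measure)."""
    class_counts = Counter()
    verb_class_counts = defaultdict(Counter)
    for verb, noun in pairs:
        cls = noun_class[noun]
        class_counts[cls] += 1
        verb_class_counts[verb][cls] += 1
    total = sum(class_counts.values())

    result = {}
    for verb, counts in verb_class_counts.items():
        verb_total = sum(counts.values())
        # Each class's contribution to KL(P(class | verb) || P(class)).
        contrib = {}
        for cls, n in counts.items():
            p_cond = n / verb_total
            p_prior = class_counts[cls] / total
            contrib[cls] = p_cond * math.log(p_cond / p_prior)
        strength = sum(contrib.values())  # preference strength of the verb
        result[verb] = {
            cls: (c / strength if strength > 0 else 0.0)
            for cls, c in contrib.items()
        }
    return result


def idiom_candidates(pairs, noun_class, threshold=0.0):
    """Return sorted unique pairs whose object class scores below threshold."""
    assoc = selectional_association(pairs, noun_class)
    return sorted({
        (verb, noun)
        for verb, noun in pairs
        if assoc[verb][noun_class[noun]] < threshold
    })
```

Calling `idiom_candidates(PAIRS, NOUN_CLASS)` returns the pairs whose object class the verb disprefers in this toy data. Such pairs would be candidates only: a low score shows that an object is atypical for the verb, not that the combination is idiomatic, which is presumably why the study also consults the lexical hierarchies and evaluates the output.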

Similar resources

Object and Action Naming: A Study on Persian-Speaking Children

Objectives: Nouns and verbs are the central conceptual linguistic units of language acquisition in all human languages. While the noun-bias hypothesis claims that nouns have a privilege in children’s lexical development across languages, studies on Mandarin and Korean and other languages have challenged this view. More recent cross-linguistic naming studies on children in German, Turkish,...

Automatic Acquisition of Knowledge About Multiword Predicates

Human interpretation of natural language relies heavily on cognitive processes involving metaphorical and idiomatic meanings. One area of computational linguistics in which such processes play an important, but largely unaddressed, role is the determination of the properties of multiword predicates (MWPs). MWPs such as give a groan and cut taxes involve metaphorical meaning extensions of highly...

A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds

We present an annotation study on a representative dataset of literal and idiomatic uses of infinitive-verb compounds in German newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus res...

The VNC-Tokens Dataset

Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal combination. This paper presents the VNC-Tokens dataset, a resource of almost 3000 English verb–noun combination usages annotated as to whether they are literal or idiomatic. Previous research using...

A Survey of Idiomatic Preposition-Noun-Verb Triples on Token Level

Most of the research on the extraction of idiomatic multiword expressions (MWEs) focused on the acquisition of MWE types. In the present work we investigate whether a text instance of a potentially idiomatic MWE is actually used idiomatically in a given context or not. Inspired by the dataset provided by (Cook et al., 2008), we manually analysed 9,700 instances of potentially idiomatic preposit...

Journal title: Linguistic Research

Volume 32, Issue 1

Pages 253-280

Publication date: 2015